Turning DTDs into specialized tree-automata-based schemata to match a collection of marked-up documents

Authors

  • Rafael C. Carrasco
  • Alejandro Bia
  • Mikel L. Forcada
  • Pedro M. Pérez-Antón

Abstract

Regular tree automata (RTA) or, equivalently, forest regular grammars (FRG) have been recently proposed for use as XML (extended markup language) schemata. They are more powerful than usual XML DTDs (document-type definitions), make the implementation, optimization and pruning of XML queries easier and allow for the implementation of context-sensitive content models. We describe a method for the automatic generation of a specialized RTA-based schema from a source DTD and a sample of marked-up documents showing context-sensitive behaviour in content models. It creates the smallest RTA-based schema with which all the XML documents in the sample comply and which does not accept any documents not valid according to the original DTD. In this way, new files can be created, parsed, and queried using the specialized schema but still be compliant with the original DTD. The tool is currently being tested at the Miguel de Cervantes digital library at the University of Alacant (http://cervantesvirtual.com).
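
The idea of specializing a DTD against a sample can be illustrated with a small Python sketch. This is not the authors' algorithm: it is a toy under strong assumptions. Content models are reduced to sets of allowed child elements (real DTD content models are regular expressions), and the "context" of an element is taken to be only its parent. The names DTD, SAMPLES, specialize and accepts are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy DTD (assumption): each element maps to the set of children it may contain.
DTD = {
    "doc": {"front", "body"},
    "front": {"title", "p"},
    "body": {"title", "p", "list"},
    "list": {"item"},
    "item": {"p"},
    "title": set(),
    "p": set(),
}

# Hypothetical sample collection of marked-up documents.
SAMPLES = [
    "<doc><front><title/></front>"
    "<body><p/><list><item><p/></item></list></body></doc>",
]


def specialize(dtd, samples):
    """Record, for every (parent, element) context, the children seen in the samples."""
    schema = defaultdict(set)

    def visit(node, parent):
        children = {child.tag for child in node}
        # Every sample must already be valid against the source DTD.
        if node.tag not in dtd or not children <= dtd[node.tag]:
            raise ValueError(f"sample violates the DTD at <{node.tag}>")
        schema[(parent, node.tag)] |= children
        for child in node:
            visit(child, node.tag)

    for xml in samples:
        visit(ET.fromstring(xml), None)
    return dict(schema)


def accepts(schema, xml):
    """Check a document against the specialized, context-dependent schema."""

    def ok(node, parent):
        allowed = schema.get((parent, node.tag))
        if allowed is None:  # element never observed in this context
            return False
        return all(child.tag in allowed and ok(child, node.tag) for child in node)

    return ok(ET.fromstring(xml), None)


schema = specialize(DTD, SAMPLES)
```

Because every recorded child set is a subset of the corresponding DTD set, anything this toy schema accepts is also valid under the toy DTD, while DTD-valid documents that use an element in a context absent from the sample (for example a p inside front) are rejected.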

Similar resources

Evaluating Structural Similarity in XML Documents

XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents...

Using Regular Tree Automata as XML Schemas

We address the problem of tight XML schemas and propose regular tree automata to model XML data. We show that the tree automata model is more powerful than XML DTDs and is closed under the main algebraic operations. We introduce an XML query algebra based on the tree automata model, and discuss query optimization and query pruning techniques. Finally, we show the conversion of tree automata s...

Efficient inclusion checking for deterministic tree automata and XML Schemas

We present algorithms for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| · |B|) where B is deterministic (bottom-up or top-down). We extend our algorithms for testing inclusion of automata for unranked trees A in deterministic DTDs or deterministic EDTDs with restrained competition D in time O(|A| · |Σ| · |D|). Previous algorithms were less efficient or less general.

TREE AUTOMATA BASED ON COMPLETE RESIDUATED LATTICE-VALUED LOGIC: REDUCTION ALGORITHM AND DECISION PROBLEMS

In this paper, we first define the concepts of response function and accessible states of a complete residuated lattice-valued (for simplicity we write $\mathcal{L}$-valued) tree automaton with a threshold $c$. Then, related to these concepts, we prove some lemmas and theorems that are applied in considering some decision problems such as finiteness-value and emptiness-value of recognizable t...

DTD-driven bilingual document generation

Extensively annotated bilingual parallel corpora can be exploited to feed editing tools that integrate the processes of document composition and translation. Here we discuss the architecture of an interactive editing tool that, on top of techniques common to most Translation Memory-based systems, applies the potential of SGML's DTDs to guide the process of bilingual document generation. Rather ...

Publication date: 2001